Fix hidden states and quant kv cache #10854

Li-Z-Q · 2025-07-17T06:12:33Z

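Support returning hidden_states from the quantized code path.
Support quantized loading for embedding models, with two modes: weight_only_int8 and weight_only_int4.
When an embedding model is loaded with quantization, preallocate only the first layer's kv_cache and reuse it in subsequent computation, which reduces GPU memory usage.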
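To give a rough sense of the memory saving described above, here is a back-of-the-envelope sketch. It is not taken from the PR: the function name `kv_cache_bytes`, the cache layout (one key and one value tensor per layer), and every dimension below are assumptions chosen for illustration only.

```python
def kv_cache_bytes(
    num_layers,
    batch_size,
    num_kv_heads,
    max_seq_len,
    head_dim,
    bytes_per_elem=2,          # assumed 16-bit cache elements
    reuse_first_layer=False,   # hypothetical flag mirroring the reuse idea
):
    """Estimate preallocated KV cache size in bytes (illustrative only)."""
    # One key tensor and one value tensor per layer.
    per_layer = 2 * batch_size * num_kv_heads * max_seq_len * head_dim * bytes_per_elem
    # With reuse, only a single layer's buffer is allocated and shared.
    allocated_layers = 1 if reuse_first_layer else num_layers
    return per_layer * allocated_layers


# Hypothetical model dimensions, not measured from any real checkpoint.
full = kv_cache_bytes(28, 8, 4, 4096, 128)
reused = kv_cache_bytes(28, 8, 4, 4096, 128, reuse_first_layer=True)
print(f"full: {full / 2**20:.0f} MiB, reused: {reused / 2**20:.0f} MiB")
```

Under these assumptions the preallocated cache shrinks by a factor equal to the number of layers. The actual saving depends on the real cache layout and dtype used by the predictor.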

paddle-bot · 2025-07-17T06:12:41Z

Thanks for your contribution!

DrownFish19 · 2025-08-13T07:21:38Z

paddlenlp/experimental/transformers/fused_transformer_layers.py

@@ -1481,7 +1481,7 @@ def forward(
        self.pre_process(**kwargs)
        kwargs["cum_offsets"] = cum_offsets

-        if caches is not None:
+        if caches is not None and not kwargs["kv_cache_reuse"]:


This spot needs to check whether kv_cache_reuse is present in kwargs, and fall back to a default value if it is not.
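One way to address this review comment is to read the flag with a default instead of indexing `kwargs` directly, which raises `KeyError` for callers that never pass it. The following is a minimal sketch, not the code merged in this PR: the helper name `needs_cache_length_check` is hypothetical, and the default of `False` is an assumption.

```python
def needs_cache_length_check(caches, **kwargs):
    """Return True when the cache-count check should run (illustrative only)."""
    # dict.get avoids a KeyError when the caller does not supply the flag.
    # Defaulting to False keeps the pre-existing behaviour for such callers.
    kv_cache_reuse = kwargs.get("kv_cache_reuse", False)
    return caches is not None and not kv_cache_reuse


# Caller that predates the flag: the check still runs.
print(needs_cache_length_check(["layer0_cache"]))
# Caller that opts in to reuse: the check is skipped.
print(needs_cache_length_check(["layer0_cache"], kv_cache_reuse=True))
```

With direct indexing (`kwargs["kv_cache_reuse"]`), the first call would fail before reaching the condition at all.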

Liujie0926 · 2025-08-20T11:21:50Z

[PaddleNLP-CI] The job failed. Manual verification shows that the PR code raises an error when running the GRPO case. To reproduce manually:
cd PaddleNLP/llm/alignment/rl/
python reward/reward_server.py >log_reward 2>&1 &  # start the service
export PYTHONPATH=PaddleNLP/:$PYTHONPATH
export PYTHONPATH=PaddleNLP/llm:$PYTHONPATH
python -u -m paddle.distributed.launch --devices "0,1,2,3" run_rl.py ../../config/qwen/grpo_argument.yaml

Error message:
Traceback (most recent call last):
  File "/workspace/PaddleNLP/llm/alignment/rl/run_rl.py", line 453, in <module>
    main()
  File "/workspace/PaddleNLP/llm/alignment/rl/run_rl.py", line 434, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/workspace/PaddleNLP/paddlenlp/rl/trainer/ppo_trainer.py", line 1397, in train
    generated_batches: List[DataProto] = self.actor_trainer.generate_sequences(
  File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 396, in _decorate_function
    return func(*args, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/rl/trainer/actor_trainer.py", line 414, in generate_sequences
    sequences = self.get_model(False).generate(
  File "/workspace/PaddleNLP/paddlenlp/rl/utils/infer_utils.py", line 315, in generate
    outputs = policy_predictor.predict(input_ids=input_ids, repeat_num=repeat_num, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 396, in _decorate_function
    return func(*args, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/rl/utils/infer_utils.py", line 98, in predict
    outputs = self.predict_dy_insert(
  File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 396, in _decorate_function
    return func(*args, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/utils/import_utils.py", line 105, in wrapper
    return func(self, *args, **kwargs)
  File "/workspace/PaddleNLP/llm/predict/predictor.py", line 1525, in predict_dy_insert
    self._infer(self.model_inputs)
  File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 396, in _decorate_function
    return func(*args, **kwargs)
  File "/workspace/PaddleNLP/llm/predict/predictor.py", line 1172, in _infer
    return self.model.generate(
  File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 396, in _decorate_function
    return func(*args, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/generation_utils.py", line 669, in generate
    ret = self.sample(
  File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/generation_utils.py", line 777, in sample
    outputs = forward(**model_kwargs)  # [bs, 1, dim_embed]
  File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/generation_utils.py", line 692, in forward
    return self(**model_inputs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1571, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/qwen2/modeling.py", line 1553, in forward
    hidden_states, full_hidden_states = self.qwen2(
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1571, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/qwen2/modeling.py", line 1327, in forward
    hidden_states, full_hidden_states = self.transformer_block(
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1571, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/fused_transformer_layers.py", line 1486, in forward
    assert len(caches) == len(self.linear_weights) or len(caches) == 2 * len(self.linear_weights)
AssertionError

Liujie0926 · 2025-08-20T11:28:00Z

The network issue with the Test job has been fixed; please merge the latest develop code into this branch.

Li-Z-Q · 2025-08-21T02:41:19Z

> The network issue with the Test job has been fixed; please merge the latest develop code into this branch.

Merged.

Li-Z-Q · 2025-08-21T02:42:17Z

> [PaddleNLP-CI] The job failed. Manual verification shows that the PR code raises an error when running the GRPO case. [...]
> File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/fused_transformer_layers.py", line 1486, in forward
> assert len(caches) == len(self.linear_weights) or len(caches) == 2 * len(self.linear_weights)
> AssertionError

Fixed by changing the default value of kv_cache_reuse.

DrownFish19

LGTM

* fix hidden states
* fix quant kv_cache
* fix Lint style
* fix kv_cache_reuse key error
* fix kv_cache_reuse key error
* remove unused code
* fix kv_cache_reuse default

Li-Z-Q added 2 commits July 17, 2025 13:40

fix hidden states

e191904

fix quant kv_cache

3352aea

paddle-bot bot added the contributor label Jul 17, 2025

paddle-bot bot assigned KB-Ding Jul 17, 2025

Li-Z-Q added 4 commits July 17, 2025 16:34

fix Lint style

305835c

Merge branch 'PaddlePaddle:develop' into fix_hidden_states_and_quant_…

886a2fa

…kv_cache

Merge branch 'PaddlePaddle:develop' into fix_hidden_states_and_quant_…

50df9f8

…kv_cache

Merge branch 'PaddlePaddle:develop' into fix_hidden_states_and_quant_…

fcc1b20

…kv_cache

DrownFish19 reviewed Aug 13, 2025

DrownFish19 added the Beijing Innovation Consortium label Aug 13, 2025

Li-Z-Q added 4 commits August 13, 2025 15:51

fix kv_cache_reuse key error

b7425dc

fix kv_cache_reuse key error

d1742a8

Merge branch 'PaddlePaddle:develop' into fix_hidden_states_and_quant_…

7857772

…kv_cache

remove unused code

9aa280a

Li-Z-Q added 2 commits August 21, 2025 10:23

Merge branch 'PaddlePaddle:develop' into fix_hidden_states_and_quant_…

bdfafbe

…kv_cache

fix kv_cache_reuse default

7ea50cf

DrownFish19 approved these changes Aug 21, 2025

DrownFish19 merged commit c83684f into PaddlePaddle:develop Aug 21, 2025
9 of 10 checks passed
